Fix collinearity inflation for ordinal models#903

Open
jmgirard wants to merge 11 commits into easystats:main from jmgirard:jmgirard/issue900

Conversation

@jmgirard
Contributor

Fixes #900

What this PR does

This PR resolves an issue where check_collinearity() calculated artificially inflated VIFs for ordinal models (clm and clmm from the ordinal package).

Why the bug occurred

Previously, check_collinearity() pulled the full variance-covariance matrix. For clmm models, this matrix includes all threshold estimates (which act as multiple intercepts) and random effect variances. Leaving these structural parameters in the matrix artificially inflated the VIFs for the actual fixed predictors.

For example, with the reproducible example in #900, the calculated VIF for each predictor was 4.36, far higher than the minor covariance between the slope estimates warrants.
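
The mechanism can be illustrated with a minimal base R sketch. It assumes the VIF of a single-parameter term is the corresponding diagonal element of the inverse correlation matrix of the estimates, which for such terms matches the determinant-based formula in car::vif(). The matrix values, the parameter names, and the helper vif_from_vcov() are hypothetical and are not taken from this PR or from the model in #900.

```r
# Hypothetical variance-covariance matrix for a cumulative link model with
# two thresholds ("1|2", "2|3") and two slopes ("x1", "x2").
# The numbers are invented purely for illustration.
par_names <- c("1|2", "2|3", "x1", "x2")
V <- matrix(
  c(0.040, 0.012, 0.012, 0.012,
    0.012, 0.040, 0.012, 0.012,
    0.012, 0.012, 0.040, 0.002,
    0.012, 0.012, 0.002, 0.040),
  nrow = 4, byrow = TRUE,
  dimnames = list(par_names, par_names)
)

# Hypothetical helper: VIF for single-parameter terms, taken as the diagonal
# of the inverse of the correlation matrix derived from a vcov matrix.
vif_from_vcov <- function(v) {
  out <- diag(solve(cov2cor(v)))
  names(out) <- colnames(v)
  out
}

slopes <- c("x1", "x2")

# Thresholds left in the matrix: their covariance with the slopes counts
# toward the slopes' VIFs.
vif_full <- vif_from_vcov(V)[slopes]

# Thresholds removed first: only covariance among the slopes counts.
vif_slopes <- vif_from_vcov(V[slopes, slopes, drop = FALSE])

print(round(rbind(full = vif_full, slopes_only = vif_slopes), 3))
```

With any positive-definite matrix, the values in the full row cannot be smaller than those in the slopes_only row, because extra rows and columns can only add to the variance each slope shares with the other parameters.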

How it is fixed

  • Added a specific condition for clm and clmm objects to subset the variance-covariance matrix down to only the true slope parameters, using names(x$beta).
  • Synchronized the subsetting of the internal term_assign tracking vector to prevent matrix indexing errors later in the function.
  • Added drop = FALSE when subsetting to prevent dimension collapse.
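
The three points above can be sketched in base R as follows. This is not the code from the PR: the list x only mimics the $beta component of a fitted clm/clmm object, and the matrix values and the layout of term_assign (one entry per matrix column, 0 for non-slope parameters) are assumptions made for illustration.

```r
# Hypothetical stand-in for a fitted clm/clmm object; only `$beta` is mimicked.
x <- list(beta = c(x1 = 0.8))

# Hypothetical vcov: two thresholds and a single slope.
par_names <- c("1|2", "2|3", "x1")
V <- matrix(
  c(0.040, 0.012, 0.012,
    0.012, 0.040, 0.012,
    0.012, 0.012, 0.040),
  nrow = 3, byrow = TRUE,
  dimnames = list(par_names, par_names)
)

# Assumed layout: one entry per column of V, 0 marking non-slope parameters.
term_assign <- c(0L, 0L, 1L)

# One logical index, built from the slope names, is applied to both the
# matrix and the tracking vector so that they stay aligned.
keep <- colnames(V) %in% names(x$beta)
V_slopes <- V[keep, keep, drop = FALSE]
term_assign_slopes <- term_assign[keep]

# Without drop = FALSE, a single remaining slope collapses to a bare number.
V_collapsed <- V[keep, keep]

print(is.matrix(V_slopes))     # TRUE
print(is.matrix(V_collapsed))  # FALSE
```

Keeping the 1 x 1 result as a matrix matters because later steps such as cov2cor() and solve() expect matrix input.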

When tested against the reproducible example in the original post, the calculated VIFs dropped from 4.36 down to 1.12, correctly reflecting the minor covariance within the ordinal likelihood space.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the intercept-handling logic in check_collinearity.R to support ordinal models like clm and clmm. The changes introduce a more flexible method for identifying slope parameters and subsetting the variance-covariance matrix. Feedback suggests adding a safety check to ensure term_assign is synchronized with the matrix dimensions before subsetting to avoid NA values. Additionally, it was recommended to use insight::find_parameters() for better consistency with the easystats ecosystem.

Comment thread R/check_collinearity.R
Comment thread R/check_collinearity.R
@jmgirard
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the collinearity check for ordinal models (clm and clmm) by filtering for true slope parameters rather than assuming a single intercept. It also improves the robustness of matrix subsetting. A review comment recommends using the component variable instead of hardcoding $conditional to ensure consistency with other sections of the code.

Comment thread R/check_collinearity.R Outdated
@jmgirard jmgirard marked this pull request as draft April 23, 2026 17:40
@jmgirard
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the collinearity check logic to better handle ordinal models (clm/clmm) by correctly filtering slope parameters. The reviewer noted that the current implementation is susceptible to issues with rank-deficient models because it relies on names(x$beta) and manual model matrix construction. It is recommended to use insight::find_parameters(x)$conditional and insight::get_modelmatrix(x) to ensure robust alignment of parameters and term assignments.

Comment thread R/check_collinearity.R
@jmgirard
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the check_collinearity.zerocount function to improve the handling of ordinal models, specifically clm and clmm objects. The changes introduce a more robust method for identifying and filtering slope parameters by matching them against the model matrix and variance-covariance matrix, ensuring that term assignments are correctly synchronized. This logic replaces the previous simplified intercept-removal code. I have no feedback to provide.

@jmgirard jmgirard marked this pull request as ready for review April 23, 2026 19:08
@jmgirard
Contributor Author

@mattansb, your review would be great!

@jmgirard
Contributor Author

Hoping to see a low VIF like ~1.12 rather than a high one like 4.36

library(ordinal)
library(lme4)
#> Loading required package: Matrix
library(performance) # with the changes

set.seed(999)
n <- 500

# 1. Simulate independently sampled (approximately orthogonal) predictors
x_continuous <- rnorm(n, mean = 0, sd = 1)
x_binary <- sample(c(-0.5, 0.5), size = n, replace = TRUE, prob = c(0.85, 0.15)) 
subject_id <- factor(rep(1:50, each = 10))

# 2. Generate an ordinal outcome with MANY categories
random_intercepts <- rnorm(50, 0, 1)
latent_y <- 2 * x_continuous + 3 * x_binary + random_intercepts[as.numeric(subject_id)] + rlogis(n)

# Cut into 15 categories to generate 14 distinct thresholds
y_ordinal <- cut(
  latent_y, 
  breaks = 15, 
  ordered_result = TRUE
)

dat <- data.frame(y_ordinal, x_continuous, x_binary, subject_id)

# 3. Fit models
mod_lmer <- lmer(as.numeric(y_ordinal) ~ x_continuous + x_binary + (1 | subject_id), data = dat)
mod_clmm <- clmm(y_ordinal ~ x_continuous + x_binary + (1 | subject_id), data = dat)

# 4. Compare Collinearity Checks
check_collinearity(mod_lmer)
#> # Check for Multicollinearity
#> 
#> Low Correlation
#> 
#>          Term  VIF  VIF 95% CI adj. VIF Tolerance Tolerance 95% CI
#>  x_continuous 1.00 [1.00, Inf]     1.00      1.00     [0.00, 1.00]
#>      x_binary 1.00 [1.00, Inf]     1.00      1.00     [0.00, 1.00]
check_collinearity(mod_clmm)
#> # Check for Multicollinearity
#> 
#> Low Correlation
#> 
#>          Term  VIF   VIF 95% CI adj. VIF Tolerance Tolerance 95% CI
#>  x_continuous 1.12 [1.05, 1.29]     1.06      0.89     [0.78, 0.95]
#>      x_binary 1.12 [1.05, 1.29]     1.06      0.89     [0.78, 0.95]

Created on 2026-04-23 with reprex v2.1.1

@strengejacke
Member

Looks good! Could you possibly a) add a news item, b) increase version number, and c) if possible, add a test (e.g. based on your example)?

Development

Successfully merging this pull request may close these issues.

check_collinearity() returns artificially inflated VIFs for ordinal::clmm models

2 participants